Ambiguous Partially Observable Markov Decision Processes: Structural Results and Applications

نویسنده

Soroush Saghafian

چکیده

Markov Decision Processes (MDPs) and their generalization, Partially Observable MDPs (POMDPs), have been widely studied and used as invaluable tools in dynamic stochastic decision-making. However, two major barriers have limited their application for problems arising in various practical settings: (a) computational challenges for problems with large state or action spaces, and (b) ambiguity in transition probabilities, which are typically hard to quantify. While several solutions for the first challenge, known as “curse of dimensionality,” have been proposed, the second challenge remains unsolved and even untouched in the case of POMDPs. We refer to the second challenge as the “curse of ambiguity,” and address it by developing a generalization of POMDPs termed Ambiguous POMDPs (APOMDPs). The proposed generalization not only allows the decision maker to take into account imperfect state information, but also tackles the inevitable ambiguity with respect to the correct probabilistic model. Importantly, this paper extends various structural results from POMDPs to APOMDPs. Such structural results can guide the decision maker to make robust decisions when facing model ambiguity. Robustness is achieved by using α-maximin expected utility (α-MEU), which (a) differentiates between ambiguity and ambiguity attitude, (b) avoids the over conservativeness of traditional maximin approaches widely used in robust optimization, and (c) is found to be suitable in laboratory experiments in various choice behaviors including those in portfolio selection. The structural results provided also help to handle the “curse of dimensionality,” since they significantly simplify the search for an optimal policy. Furthermore, we provide an analytical performance guarantee for the APOMDP approach by developing a bound for its maximum reward loss due to model ambiguity. To generate further insights into how APOMDPs can help to make better decisions, we also discuss specific applications of APOMDPs including machine replacement, medical decision-making, inventory control, revenue management, optimal search, sequential design of experiments, bandit problems, and dynamic principal-agent models.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Ambiguous POMDPs: Structural Results and Applications

متن کامل

Transition Entropy in Partially Observable Markov Decision Processes

This paper proposes a new heuristic algorithm suitable for real-time applications using partially observable Markov decision processes (POMDP). The algorithm is based in a reward shaping strategy which includes entropy information in the reward structure of a fully observable Markov decision process (MDP). This strategy, as illustrated by the presented results, exhibits near-optimal performance...

متن کامل

A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems

Maintenance can be the factor of either increasing or decreasing system's availability, so it is valuable work to evaluate a maintenance policy from cost and availability point of view, simultaneously and according to decision maker's priorities. This study proposes a Partially Observable Markov Decision Process (POMDP) framework for a partially observable and stochastically deteriorating syste...

متن کامل

Spoken Dialogue Management Using Probabilistic Reasoning

Spoken dialogue managers have benefited from using stochastic planners such as Markov Decision Processes (MDPs). However, so far, MDPs do not handle well noisy and ambiguous speech utterances. We use a Partially Observable Markov Decision Process (POMDP)-style approach to generate dialogue strategies by inverting the notion of dialogue state; the state represents the user’s intentions, rather t...

متن کامل

The Complexity of Deterministically Observable Finite-Horizon Markov Decision Processes

We consider the complexity of the decision problem for diierent types of partially-observable Markov decision processes (MDPs): given an MDP, does there exist a policy with performance > 0? Lower and upper bounds on the complexity of the decision problems are shown in terms of completeness for NL, P, NP, PSPACE, EXP, NEXP or EXPSPACE, dependent on the type of the Markov decision process. For se...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره شماره

صفحات -

تاریخ انتشار 2015

Ambiguous Partially Observable Markov Decision Processes: Structural Results and Applications

نویسنده

چکیده

منابع مشابه

Ambiguous POMDPs: Structural Results and Applications

Transition Entropy in Partially Observable Markov Decision Processes

A POMDP Framework to Find Optimal Inspection and Maintenance Policies via Availability and Profit Maximization for Manufacturing Systems

Spoken Dialogue Management Using Probabilistic Reasoning

The Complexity of Deterministically Observable Finite-Horizon Markov Decision Processes

عنوان ژورنال:

اشتراک گذاری